Probabilistic Data Integration
Authors
Department of Computer Science, University of Bologna, Mura A. Zamboni 7, 40127 Bologna, Italy.
Abstract
In this paper we propose and experimentally evaluate a data integration approach in which the uncertainty generated while comparing and merging the input data sources is included in the resulting mediated schema, where it can be used to provide richer answers to users. We describe a system implementing our method and use it to study empirically how uncertainty management affects the effectiveness and efficiency of the data integration process. In particular, we test our approach on benchmark datasets, showing that taking uncertainty into account can increase the recall of the method, and on real databases, showing that it can be applied to large data sources.
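The abstract does not specify how uncertainty is represented in the mediated schema. As a minimal sketch of the general idea only, assuming that attribute correspondences carry matcher scores, the Python below contrasts a deterministic mapping (keep only the best match) with a probabilistic one (keep every sufficiently plausible match, normalized into probabilities) and shows why the latter can raise recall. The scores, thresholds, attribute names, and all functions (`deterministic_mapping`, `probabilistic_mapping`, `answer`, `recall`) are hypothetical illustrations, not the authors' system or data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical matcher scores between the source-A attribute "phone" and the
# attributes of source B (illustrative values, not measured results).
SCORES: Dict[str, float] = {"telephone": 0.6, "mobile": 0.3, "fax": 0.1}

# Hypothetical source-B tuples; each stores its number under a different attribute.
ROWS: List[Dict[str, str]] = [
    {"name": "Rossi", "telephone": "051-111"},
    {"name": "Bianchi", "mobile": "333-222"},
]

# Values a user asking for "phone" would consider correct (toy ground truth).
RELEVANT: Set[str] = {"051-111", "333-222"}


def deterministic_mapping(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep only the best-scoring correspondence, discarding all uncertainty."""
    best = max(scores, key=scores.get)
    return {best: 1.0} if scores[best] >= threshold else {}


def probabilistic_mapping(scores: Dict[str, float], min_score: float = 0.2) -> Dict[str, float]:
    """Keep every correspondence scoring at least min_score, normalized into probabilities."""
    kept = {attr: s for attr, s in scores.items() if s >= min_score}
    total = sum(kept.values())
    return {attr: s / total for attr, s in kept.items()} if total > 0 else {}


def answer(rows: List[Dict[str, str]], mapping: Dict[str, float]) -> List[Tuple[str, float]]:
    """Answer a query on the mediated attribute as (value, probability) pairs."""
    results = []
    for row in rows:
        for attr, prob in mapping.items():
            if attr in row:
                results.append((row[attr], prob))
    return results


def recall(results: List[Tuple[str, float]], relevant: Set[str]) -> float:
    """Fraction of relevant values that appear among the answers."""
    found = {value for value, _ in results}
    return len(found & relevant) / len(relevant)


if __name__ == "__main__":
    det = answer(ROWS, deterministic_mapping(SCORES))
    prob = answer(ROWS, probabilistic_mapping(SCORES))
    print(recall(det, RELEVANT))   # 0.5
    print(recall(prob, RELEVANT))  # 1.0
```

With these toy inputs the deterministic mapping misses the mobile number, while the probabilistic mapping returns it with a lower probability that the user can inspect or filter on. Keeping low-scoring candidates also admits wrong matches, so the cut-off (`min_score` here) trades recall against precision.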
Similar resources
Dealing with Uncertainty in Lexical Annotation
We present ALA, a tool for the automatic lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) of structured and semi-structured data sources and the discovery of probabilistic lexical relationships in a data integration environment. ALA performs automatic lexical annotation through the use of probabilistic annotations, i.e. an annotation is associated with a probability value....
Tuple Merging in Probabilistic Databases
Real-world data are often uncertain and incomplete. In probabilistic relational data models, uncertainty can be modeled on two levels: first, by representing the uncertain instance of a tuple by a set of possible instances, and second, by assigning each tuple its degree of membership in the considered relation. To overcome incompleteness, data from multiple sources need to be combined. In orde...
Probabilistic Data Integration Systems
Current data integration techniques are successful at managing well-defined and well-understood data integration tasks, but do not cope well with uncertainty. However, the amount of uncertain data is growing with the number and variety of data sources being integrated, both in traditional data integration tasks such as enterprise data integration, and in next-generation integration problems such as c...
Uncertainty in data integration systems: automatic generation of probabilistic relationships
We propose a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is a probabilistic word sense disambiguation (PWSD) approach that makes it possible to automatically lexically annotate (i.e. annotation w.r.t. a...
Integration of Probabilistic Uncertain Information
We study the problem of data integration from sources that contain probabilistic uncertain information. Data is modeled as possible worlds with a probability distribution, compactly represented in the probabilistic relation model. Integration is achieved efficiently using the extended probabilistic relation model. We study the problem of determining the probability distribution of the integration...
An Approach to Probabilistic Data Integration for the Semantic Web
In previous work, we have introduced probabilistic description logic programs for the Semantic Web, which combine description logics, normal programs under the answer set (resp., well-founded) semantics, and probabilistic uncertainty. In this paper, we continue this line of research. We propose an approach to probabilistic data integration for the Semantic Web that is based on probabilistic des...